Hashing Graphs Trees Search Trees Indexing and Multiway Trees File Organization

Introduction

The Table abstract data type

Implementations of the table data structure

Hash Tables

Bucket Array

Hash Function

Hash Code

Compression Functions

Collision-Handling Schemes

Collision likelihoods and load factors for hash tables

Hash Table Implementation

A simple Hash Table in operation

Strategies for dealing with collisions

Linear Probing

Double Hashing

Complexity of hash tables

Hash collisions occur when two different keys produce the same hash value. To handle these collisions, various techniques are employed. Let's explore three standard approaches: Buckets, Direct Chaining, and Open Addressing.


1. Buckets:


Imagine a scenario where we allocate a two-dimensional array and use each column as a bucket. Keys that produce the same hash value are placed in the same column. For example, let's say we have airports and their corresponding codes:


0 1 2 3 4 5 6 7 8 9 10 LAX PHL FRA GCM AKL ORY DCA HKG GLA


Here, keys like PHL and HKG fall into the same bucket. However, this method requires more space than necessary and becomes slower as the table fills up.


2. Direct Chaining:


In this method, instead of reserving entire sub-arrays for collided keys, we create linked lists for each key. For instance:


LAX DCA PHL → FRA → GCM HKG AKL → ORY GLA

Here, keys are stored in linked lists, which eliminates the need for reserving excessive space. However, finding a particular key requires traversing lists.


Calculating Average List Size:


We can compute the average size of non-empty lists using probabilities. With 'n' items in an array of size 'm', the probability of no items landing in a slot is calculated. From this, we can derive the average number of items in a non-empty list. Linear search for an item in a list of size 'k' takes, on average, 'k + 1 / 2' comparisons.


3. Open Addressing:


This approach involves finding another open location for collided entries. Two common strategies are linear probing and double hashing.

Linear Probing: If a collision occurs, the algorithm searches for the next available slot by decrementing the index until an empty space is found.
Double Hashing: This method involves using a secondary hash function to find an empty location, which mitigates clustering issues associated with linear probing.


Comparison and Analysis:


Space Efficiency: Direct chaining saves space by not reserving entire buckets but relies on linked lists, which can be memory-intensive.

Time Complexity: Open addressing provides constant-time complexity for insertion and search operations, making it efficient for large datasets.

Traversal: Direct chaining and buckets require traversal of linked lists or columns, impacting efficiency, whereas open addressing methods provide faster traversal.

Adaptability: Open addressing allows for dynamic resizing of the hash table to maintain optimal load factors, ensuring efficient performance over time.


In conclusion, each collision resolution technique has its advantages and drawbacks. Understanding the trade-offs between space efficiency, time complexity, and adaptability is crucial in selecting the appropriate approach based on the specific requirements of the application.

Hash Code:


A hash code is a unique numeric value assigned to each key in a hash map. It acts as a fingerprint for the key, helping in quick identification and retrieval of associated data. Though not necessarily limited to a specific range, hash codes should aim to minimize collisions, where different keys produce the same hash code. Ensuring consistency, equal keys must yield the same hash code. Essentially, a hash code serves as a compact representation of a key, facilitating efficient storage and retrieval operations in hash-based data structures.